Incident Threading in News
Abstract
With an overwhelming volume of news reports currently available, there is an increasing need for automatic techniques that analyze and present news to a general reader in a meaningful and efficient manner. Previous research has focused primarily on organizing news stories into a list of clusters by the main topics they discuss. We believe that viewing a news topic as a simple collection of stories is restrictive and inefficient for a user hoping to understand the information quickly. As a proposed solution to the automatic news organization problem, we introduce incident threading in this thesis. All text that describes the occurrence of a real-world happening is merged into a news incident, and incidents are organized in a network with dependencies of predefined types. To simplify the implementation, we start with the common assumption that a news story is coherent in content. In the story threading system, a cluster of news documents discussing the same topic is further grouped into smaller sets, each of which represents a separate news event. Binary links are established to reflect the contextual information among those events. Experiments in story threading show promising results. We next describe an enhanced version, called relation-oriented story threading, that extends the range of the prior work by assigning type labels to the links and by describing the relation within each story pair as a competitive process among multiple options. The quality of links is greatly improved with a global optimization process. Our final approach, passage threading, removes the story-coherence assumption by conducting passage-level processing of news. First, we develop a new testbed for this research and extend the evaluation methods to address new issues. Next, a calibration study demonstrates that an incident network helps reading comprehension with an accuracy of 25-30% in a matrix comparison evaluation. Then a new three-stage algorithm is described that identifies on-subject passages, groups them into incidents, and establishes links between related incidents. Finally, significant improvement over earlier work is observed when the training phase optimizes the harmonic mean of various evaluation measures, and the performance meets the goal set in the calibration study.
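The abstract does not say which evaluation measures are combined or how training uses them. As a minimal sketch of the combining step only, the harmonic mean of several scores can be computed as below. The function name `combined_score` and the measure names in the example are hypothetical, chosen purely for illustration.

```python
def combined_score(measures):
    """Return the harmonic mean of a dict of non-negative evaluation scores.

    The measure names are placeholders; the source abstract does not
    specify which measures are combined.
    """
    values = list(measures.values())
    if not values:
        raise ValueError("at least one measure is required")
    if any(v < 0 for v in values):
        raise ValueError("measures must be non-negative")
    # A single zero score drives the harmonic mean to zero.
    if any(v == 0 for v in values):
        return 0.0
    return len(values) / sum(1.0 / v for v in values)


# Hypothetical scores for two stages of a threading pipeline.
example = {"passage_score": 0.5, "link_score": 0.25}
print(combined_score(example))
```

Because the harmonic mean is dominated by the smallest value, optimizing it discourages a system from improving one measure at the expense of another.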
Similar resources
Investigating Characteristics of Hospital Building Fires in Iran
Background: Building fires are the most common threatening and distressing hazard in hospitals. Fire is one of the top 10 hazards that threaten Iranian hospitals. Nevertheless, no study has been done on the features of fires in Iran hospitals. So, this study aimed to investigate the characteristics of fires taking place in Iranian hospitals. Materials and Methods: In this cross-sectional stu...
Topic-Based Structuring of a Very Large-Scale News Video Corpus
We introduce a topic-based inter-video structuring method that considers application to a very large-scale news video corpus as well as user interfaces that provide the users with the ability to efficiently browse through the corpus based on the topic structure. Although the proposed method is a multimedia-integrated method that refers to both text and image based information, this paper focuse...
Topic Threading for Structuring a Large-Scale News Video Archive
We are building a broadcast news video archive where topics of interest can be retrieved and tracked easily. This paper introduces a structuring method applied to the accumulated news videos. First they are segmented into topic units and then threaded according to their mutual relations. A user interface for topic thread-based news video retrieval is also introduced. Since the topic thread stru...
Analyzing News Summaries for Identification of Terrorism Incident Type
In this paper we present experiments for the detection of terrorism incident types from news summary. The news summaries from the global terrorism dataset have been analyzed using machine learning techniques. We have conducted experiments using different learning algorithms including Naive Bayes, decision tree and support vector machine. The results of the experiments show that decision tree le...
Semantic analysis of a large-scale news video archive
Figure 1: Broadcast news video archiving system. In this paper, we introduce our recent works on semantic analysis on a large volume of news video data archived for more than five years, equivalent to approximately 900 hours of MPEG-1 video data. After briefly introducing the archive, two works that analyze the news contents based on text and image information are introduced; topic threading an...
Publication date: 2008